Cell-specific and shared regulatory elements control a multigene locus active in mammary and salivary glands

Regulation of high-density loci harboring genes with different cell-specificities remains a puzzle. Here we investigate a locus that evolved through gene duplication and contains eight genes and 20 candidate regulatory elements, including one super-enhancer. Casein genes (Csn1s1, Csn2, Csn1s2a, Csn1s2b, Csn3) are expressed in mammary glands, are induced 10,000-fold during pregnancy and account for 50% of mRNAs during lactation; Prr27 and Fdcsp are salivary-specific; and Odam has dual specificity. We probed the function of 12 candidate regulatory elements, individually and in combination, in the mouse genome. The super-enhancer is essential for the expression of Csn3, Csn1s2b, Odam and Fdcsp but largely dispensable for Csn1s1, Csn2 and Csn1s2a. Csn3 activation also requires its own local enhancer. Synergism between local enhancers and cytokine-responsive promoter elements facilitates activation of Csn2 during pregnancy. Our work reveals the regulatory complexity of a multigene locus with an ancestral super-enhancer active in mammary and salivary tissue, and with local enhancers and promoter elements unique to mammary tissue.



Software and code
Data collection
ChIP-seq and RNA-seq data in GEO were downloaded using SRA Toolkit (version 2.10.9), and newly generated ChIP-seq and RNA-seq reads were collected using HCS 3.4.0 software for the HiSeq 3000 and NovaSeq control software v1.7.5 for the NovaSeq 6000.

Lothar Hennighausen, Jul 27, 2023


Life sciences study design
Sample size: Fifteen mutant mouse lines with deletions of regulatory elements were generated. No statistical methods were used to determine sample size. In general, at least three independent replicates were performed in all experiments. When possible, animal experiments were replicated in at least two different cohorts. The sample size used for each experiment is indicated in the corresponding figure legend in the manuscript.
Data exclusions: No data were excluded from the analysis.
Replication: The number of independent replicates for each experiment is indicated in the corresponding figure legend in the manuscript. In general, at least three independent replicates, and two independent ChIP-seq replicates, were performed.
Randomization: In all animal studies, groups were allocated randomly. Age- and gender-matched animals were used in all experiments.
Blinding: For all animal studies, the investigators were blinded to group allocation. Blinding was not applicable to the other experiments.
5-10 µg of antibody was added to 1 mg of total protein (1 ml solution).

No wild animals were used in the study.
Mammary gland tissues were collected from female mice and salivary gland tissues were collected from male mice.
Mammary gland tissues from specific stages of pregnancy and lactation were harvested and stored at -80°C until used in experiments.
All animals were housed and handled according to the guidelines of the Animal Care and Use Committee (ACUC) of the NIH (https://oacu.oir.nih.gov), and all animal experiments were approved by the ACUC of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK, MD) and performed under NIDDK animal protocol K089-LGP-17.

Data access links
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE231441
April 2023

delCsn2-P-E123-B_L1_PolII_rep1
delCsn2-P-E123-B_L1_PolII_rep2
delCsn2-P-E123-B_L1_STAT5_rep1
delCsn2-P-E123-B_L1_STAT5_rep2
WT_virMEC_NFIB
WT_virMEC_PolII

Genome browser session (e.g. UCSC): no longer applicable

Methodology
Replicates: The delCsn2-P-E123_L1_GR, delCsn3-E1_L1_NFIB and delOdam_L1_NFIB samples each have one data set because they were not critical for the study. The WT_virMEC_H3K27ac, WT_virMEC_NFIB and WT_virMEC_PolII samples each have one data set because several mice were used for one ChIP-seq. For all other ChIP-seq experiments, more than two replicates were conducted.
Sequencing depth: All sequencing was performed as 51-bp single-end reads, to a depth of > 30 million reads per biological replicate.
Data quality: More than 20,000 peaks for transcription factors (TFs) and more than 100,000 peaks for histone marks, called by q-value (< 0.001 for TFs; 0.1 or 0.5 for histone marks), were at 5% FDR and above 4-fold enrichment.
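
The q-value and fold-enrichment cutoffs described here can be illustrated with a short filter. This is a minimal sketch, not the authors' pipeline. It assumes peaks are stored in ENCODE narrowPeak format, where column 7 is the fold enrichment (signalValue) and column 9 is -log10(q-value); the source names neither the peak caller nor the file format. The function name `filter_peaks` and the example rows are hypothetical.

```python
import math
from typing import Iterable, List


def filter_peaks(lines: Iterable[str], max_q: float, min_fold: float = 4.0) -> List[str]:
    """Keep narrowPeak rows with q-value < max_q and fold enrichment > min_fold.

    Assumes (not stated in the source) narrowPeak columns:
    7 = signalValue (fold enrichment), 9 = -log10(q-value).
    """
    min_neg_log10_q = -math.log10(max_q)  # e.g. q < 0.001  ->  -log10(q) > 3
    kept = []
    for line in lines:
        # Skip blank lines and optional browser/track header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        fold = float(fields[6])
        neg_log10_q = float(fields[8])
        if neg_log10_q > min_neg_log10_q and fold > min_fold:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical rows for illustration only (not data from the study).
example = [
    "chr5\t100\t600\tpeak1\t500\t.\t12.5\t20.0\t17.2\t250",
    "chr5\t900\t1300\tpeak2\t80\t.\t3.1\t6.0\t4.5\t180",
    "chr5\t2000\t2400\tpeak3\t60\t.\t6.0\t3.5\t2.1\t200",
]

tf_peaks = filter_peaks(example, max_q=0.001)      # TF cutoff from the text
histone_peaks = filter_peaks(example, max_q=0.1)   # one of the histone cutoffs
print(len(tf_peaks), len(histone_peaks))
```

The same function covers both peak classes by varying `max_q`; the 4-fold enrichment cutoff is the default for `min_fold`.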